LONG-TERM GOALS

To apply optimal data assimilation techniques to ocean circulation models in order to improve
short-range prediction of mesoscale circulation.

OBJECTIVES 

The immediate scientific objective of this research project is to develop a data assimilation system,
based on ensemble Kalman filter (EnKF) techniques, and to apply this system to a realistic
eddy-resolving ocean circulation model.

APPROACH 

The basic elements of an ensemble-based data assimilation system include a system for collating and
preparing observations, combining observations with a model now-cast, initializing and running an
ensemble of forecasts, and estimating model and observation errors. Each of these components is
currently under development and is being systematically tested on a suite of models ranging from a
simple linear model (Evensen 2004), to a small, highly non-linear model (Lorenz and Emanuel 1998),
and finally to idealised and realistic configurations of an ocean general circulation model (MOM4.0;
Griffies et al. 2004).

Under  this  project,  we  have  explored  the  rationale  for  different  ensemble-based  assimilation  algorithms 
and  techniques;  and  compared  the  performance  of  different  filters  for  a  suite  of  small  models  (Oke  et 
al.  2006;  Sakov  et  al.  2006).  Subsequently,  we  have  developed  a  new  formulation  for  the  EnKF  that  we 
refer  to  as  the  deterministic  EnKF  (DEnKF;  Sakov  and  Oke  2006). 

We  have  begun  to  assess  some  of  the  above-mentioned  developments  through  their  application  to  an 
ocean  general  circulation  model.  The  assimilation  system  we  have  used  for  this  is  the  Bluelink  Ocean 
Data  Assimilation  System  (BODAS)  that  has  been  developed  by  the  P.I.  under  a  closely  related  project 
called  BLUElink  (see  Related  Projects  for  details).  BODAS  is  a  modular  ensemble-based  data 
assimilation  system  that  computes  an  analysis  of  the  three-dimensional  ocean  circulation  by  combining 
the  relevant  elements  of  an  ensemble  of  anomalies  of  model  variables,  a  model  forecast  and  a  range  of 
observations. To date, an ensemble optimal interpolation (EnOI; Oke et al. 2002) version of BODAS
has been applied to the Ocean Forecasting Australia Model (OFAM; Schiller et al. 2006), a global
configuration of MOM4.0, with eddy-resolving resolution around Australia. BODAS and OFAM have
been  combined  and  implemented  into  the  Ocean  Model,  Analysis  and  Prediction  System 
(OceanMAPS)  that  is  currently  undergoing  operational  trials  at  the  Australian  Bureau  of  Meteorology. 
Under  this  project  we  have  developed  an  EnKF  version  of  BODAS.  This  system  is  currently  being 
evaluated. 

Key individuals:

Dr.  Peter  Oke  is  the  P.I.  on  this  project  and  leads  the  data  assimilation  group  at  CSIRO  Marine  and 
Atmospheric  Research  (CMAR).  Dr.  Oke  is  responsible  for  the  development  of  BODAS  under  the 
BLUElink  project. 

Dr.  Pavel  Sakov  is  currently  investigating  algorithms  for  ensemble  data  assimilation  and  methods  for 
optimal  array  design  under  a  related  project. 

Dr. Stuart Corney was hired by CMAR in April 2004 to conduct research under this project. Dr. Corney
has developed a prototype EnKF system based on BODAS and has applied it to a regional
configuration of MOM4.0.

The US-based researchers involved in this project are Dr. H. Ngodock, Dr. G. Jacobs and Dr. R.
Miller. All have a long history of research in the area of ocean data assimilation.

WORK  COMPLETED 

We have completed a study that compared the performance of an EnKF with perturbed observations
(Burgers et al. 1998) and EnOI when applied to a two-variable linear system that is similar to that of
Evensen (2004). In this study we have investigated the impact of localisation (Houtekamer and
Mitchell 2001; Hamill et al. 2001) on a filter’s performance and on the dynamical balances that are
inherent in the model (Oke et al. 2006).

We  have  explored  the  theoretical  basis  for  different  formulations  of  the  ensemble  square-root  filters 
(ESRF;  e.g.,  Tippett  et  al.  2003)  and  shown  that  only  a  sub-set  of  these  algorithms  work,  in  practice,  as 
they  are  intended  to  work  in  theory.  Additionally,  through  a  series  of  numerical  experiments  we  have 
compared  the  performance  of  different  filters  when  applied  to  two  small  models  (Sakov  et  al.  2006). 

We  have  proposed  a  new  formulation  for  the  EnKF  that  combines  the  simplicity  and  flexibility  of  the 
traditional  EnKF  with  the  robustness  and  superior  performance  of  the  ESRF  (Sakov  and  Oke  2006a). 
We  call  this  formulation  the  DEnKF. 

A  prototype  EnKF  system,  based  on  BODAS,  has  been  developed  and  applied  to  an  idealised 
configuration  of  MOM4.0  that  models  the  mesoscale  ocean  circulation  in  Bass  Strait  (between  Victoria 
and  Tasmania,  Australia).  This  prototype  system  is  currently  being  evaluated. 


RESULTS 


Impacts  of  localisation  in  EnOI  and  EnKF: 

The performance of an inexpensive EnOI scheme, which uses a stationary ensemble of model anomalies
to approximate forecast error covariances, is compared with that of an EnKF with perturbed
observations.  The  model  to  which  the  methods  are  applied  is  a  pair  of  “perfect”  one-dimensional, 
linear  advection  equations  for  two  related  variables.  While  EnOI  is  sub-optimal,  it  can  give  results  that 
are  comparable  to  those  of  the  EnKF  (Figure  1).  The  computational  cost  of  EnOI  is  typically  about  1/N 
times  that  of  the  EnKF,  where  N  is  the  ensemble  size.  We  suggest  that  EnOI  may  provide  a  practical 
and  cost-effective  alternative  to  the  EnKF  for  some  applications  where  computational  cost  is  a  limiting 
factor. We demonstrate that when the ensemble size is smaller than the dimension of the model’s
subspace, both the EnKF and EnOI may require localisation around each observation in order to eliminate
effects  of  sampling  error  and  to  increase  the  effective  number  of  independent  ensemble  members  used 
to  construct  an  analysis.  However,  we  find  that  localisation  can  degrade  an  analysis  if  the  length  scales 
of  the  localising  function  are  too  short.  We  demonstrate  that  as  the  length-scale  of  the  localising 
function  is  decreased,  localisation  can  significantly  compromise  the  model’s  dynamical  balances.  We 
also  find  that  localisation  artificially  amplifies  high  frequencies  for  applications  of  the  EnKF.  Based  on 
our  experiments,  for  applications  where  localisation  is  necessary,  the  length-scales  of  the  localisation 
should  be  larger  than  the  decorrelation  length-scales  of  the  variables  being  updated. 
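
The localisation discussed above can be illustrated with a minimal sketch, assuming a one-dimensional grid and a Gaussian localising function. Both are assumptions made here for illustration; the function names, grid spacing and taper are hypothetical and are not taken from BODAS or from the studies cited. The sketch forms a sample covariance from a small ensemble and multiplies it element by element with the taper, so that covariances between distant points, which are dominated by sampling error when the ensemble is small, are damped.

```python
import math


def sample_covariance(ensemble):
    """Sample covariance of an ensemble given as a list of state vectors."""
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / n_members for i in range(n_state)]
    anomalies = [[member[i] - mean[i] for i in range(n_state)] for member in ensemble]
    return [
        [sum(a[i] * a[j] for a in anomalies) / (n_members - 1) for j in range(n_state)]
        for i in range(n_state)
    ]


def gaussian_taper(distance, length_scale):
    """Hypothetical localising function: 1 at zero separation, decaying with distance."""
    return math.exp(-0.5 * (distance / length_scale) ** 2)


def localise(covariance, grid_spacing, length_scale):
    """Element-wise product of a covariance matrix with the distance-dependent taper."""
    n_state = len(covariance)
    return [
        [
            covariance[i][j] * gaussian_taper(abs(i - j) * grid_spacing, length_scale)
            for j in range(n_state)
        ]
        for i in range(n_state)
    ]


# Illustrative three-member ensemble on a five-point grid (made-up numbers).
ensemble = [
    [1.0, 2.0, 0.5, -1.0, 3.0],
    [0.0, 1.0, 1.5, 2.0, -1.0],
    [2.0, 0.0, -0.5, 1.0, 1.0],
]
raw = sample_covariance(ensemble)
localised = localise(raw, grid_spacing=1.0, length_scale=1.0)
```

Shortening `length_scale` in this sketch suppresses more of the off-diagonal covariance, including covariance that may be physically meaningful, which is one way to picture why very short localisation length-scales can degrade an analysis.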


[Figure 1 plot: RMSE versus time; curves for the EnKF and EnOI with N = 100, 50, 20 and 10]


Figure  1:  Root-mean-squared  error  (RMSE)  for  a  linear  advection  model  versus  time,  for  an  EnKF 
and  EnOI  with  different  ensemble  sizes,  N  (ranging  from  10-100)  with  covariance  localisation; 
showing  that  while  EnOI  is  less  optimal  than  EnKF  it  can  produce  comparable  results  (N.B.  EnOI 
costs  less  than  1/N  times  that  of  the  EnKF);  adapted  from  Oke  et  al.  (2006). 


Comparison  of  ensemble-based  filters: 


ESRFs update an ensemble of forecasts in two steps. In the first step, the ensemble mean is updated
using the analysis equation from Kalman filter theory. In the second step, the ensemble anomalies
(perturbations) are transformed (updated) so that their covariance matches the theoretical analysis error
covariance from Kalman filter theory. There are a number of algorithms for this transformation (e.g.,
Tippett et al. 2003; Evensen 2004). The most efficient of these ESRFs is the Ensemble Transform
Kalman Filter (ETKF; Bishop et al. 2001). We demonstrate that only a subset of the ESRF algorithms that
are currently employed preserve the ensemble mean (from step one). For the algorithms that do not, the
ensemble’s covariance no longer matches the theoretical value (in step two), as ESRF theory intends. This
renders the filter sub-optimal. We demonstrate the significance of this through a series of experiments
with two small models: a linear advection model (Evensen 2004; a simple, stationary, linear model)
and the Lorenz-40 model (Lorenz and Emanuel 1998; a complex, highly non-linear model that is
often used to evaluate data assimilation schemes, e.g., Whitaker and Hamill 2002; Ott et al. 2004). No
inflation (where one artificially increases the ensemble spread about the mean) was used for the linear
advection model, but a range of inflations (from 0-10%) was tested for the Lorenz-40 model. Figure 2
shows the root-mean-squared error (RMSE) versus ensemble size for both models. For the Lorenz-40
model, the results are shown for the inflation that gave the smallest RMSE for a given ensemble size,
taken from Figure 3, where RMSE is plotted as a function of ensemble size and inflation factor. Clearly, the
ESRFs that use a mean-preserving transformation (i.e., symmetric and mean-preserving random
rotations (RR)) are superior to the other ESRFs and the traditional EnKF. The footprint of the
contours in Figure 3 is an indication of the robustness of each filter (white squares indicate filter
divergence), and smaller inflation is generally desirable. The mean-preserving ESRFs are the most
robust and most accurate, and require the least inflation.
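
The mean-preservation condition described above can be written out as a short worked sketch. The notation is an assumption made here for illustration and is not taken from the report: $A^f$ and $A^a$ are the matrices of forecast and analysis anomalies (one column per ensemble member), $T$ is the ensemble transformation, $\mathbf{1}$ is a vector of ones, $N$ is the ensemble size, $R$ is the observation error covariance and $H$ is a linear observation operator. The last two lines show, under these assumptions, why the symmetric solution leaves the ensemble mean unchanged.

```latex
\begin{aligned}
A^f \mathbf{1} &= 0, \qquad A^a = A^f T, \\
A^a \mathbf{1} &= A^f T \mathbf{1} = \alpha A^f \mathbf{1} = 0
  \quad \text{whenever } T \mathbf{1} = \alpha \mathbf{1}, \\
T_s &= \left[ I + \tfrac{1}{N-1} (H A^f)^T R^{-1} (H A^f) \right]^{-1/2}, \\
T_s^{-2} \mathbf{1} &= \mathbf{1} + \tfrac{1}{N-1} (H A^f)^T R^{-1} H \left( A^f \mathbf{1} \right) = \mathbf{1}
  \;\Rightarrow\; T_s \mathbf{1} = \mathbf{1}.
\end{aligned}
```

If $T \mathbf{1}$ is not proportional to $\mathbf{1}$, the transformed anomalies generally no longer sum to zero, so the ensemble mean drifts away from the value computed in the first step.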

While Wang et al. (2004) reported a “small improvement” using a mean-preserving filter over a
non-mean-preserving filter in an atmospheric general circulation model, we suggest that their chosen metric
(an energy norm) for assessing the performance of different filters was inappropriate. As a result, we
suggest that Wang et al. (2004) understate the importance of using a mean-preserving
transformation. A number of recent studies have used non-mean-preserving solutions (e.g., Evensen
2004; Leeuwenburgh 2005; Leeuwenburgh et al. 2005; Torres et al. 2006), indicating that the
importance of preserving the mean during the ensemble transformation is not generally appreciated.
Based on both our understanding of the ESRF theory and our numerical experiments, we argue that
only mean-preserving solutions should be used in practical data assimilation.


Figure 2: RMSE versus ensemble size for the linear advection model (left; with no inflation; model
dimension = 51) and for the Lorenz-40 model (right; for the “best” inflation from Figure 3; model
dimension = 27) for different ensemble-based filters (mean-preserving filters include “ETKF,
symmetric” and “ETKF, mean-preserving RR”). The superior performance of the mean-preserving
ESRFs is evident from the smaller RMSE for all ensemble sizes; adapted from Sakov et al. (2006).

Figure 3: RMSE versus ensemble size and inflation factor for the Lorenz-40 model for different
filters, showing the superior performance of the mean-preserving filters (“ETKF, symmetric” and
“ETKF, mean-preserving random rotations”) and the DEnKF; experiments that do not converge are
white; adapted from Sakov et al. (2006) and Sakov and Oke (2006).

A  deterministic  EnKF: 

We propose a new formulation for the EnKF that combines the performance and robustness of the ESRF
with the simplicity and versatility of the EnKF. We refer to this filter as the Deterministic EnKF
(DEnKF). Following the original proposition of the EnKF (Evensen 1994), Burgers et al. (1998) noted
that a straightforward implementation of the EnKF results in a premature ensemble collapse. They
proposed the use of perturbed observations as a solution. This has generally become the standard for
the EnKF, and is typically employed along with covariance localisation for practical applications
(e.g., Houtekamer and Mitchell 2001). However, the EnKF with perturbed observations can become
problematic when a small ensemble is used because of sampling error. This has led some researchers
to employ deterministic ESRFs (e.g., Tippett et al. 2003; Evensen 2004) that do not suffer from
sampling error and are therefore more robust, particularly when a small ensemble is used. However,
we note that the inclusion of covariance localisation with the ESRF requires observations to be
assimilated serially, one at a time, which becomes computationally inefficient when many observations
are used. The DEnKF, proposed by Sakov and Oke (2006), offers a performance that is comparable to
that of the ESRFs (for all cases we have considered), but readily permits localisation, like the EnKF. We have
shown that without perturbed observations, a straightforward implementation of the EnKF yields an
analysis error covariance that can be expressed as:

$$P^a = P^f - 2KHP^f + KHP^fH^TK^T, \qquad (1)$$

where $P$ is the error covariance and superscripts $a$ and $f$ denote analysis and forecast respectively; $K$ is
the Kalman gain and $H$ is the observation operator. By neglecting the quadratic term in (1), which is
valid when $K$ is small, we note that the gain is effectively twice as large as it should be to match the
theoretical analysis error covariance, $(I - KH)P^f$. We therefore suggest a simple alternative to the traditional
EnKF, where the ensemble mean is updated using the standard analysis equation from the Kalman filter,
just like an ESRF; and the ensemble anomalies are also updated using the analysis equation from the
Kalman filter, but with half the Kalman gain:

$$A^a = A^f - \frac{1}{2}KHA^f, \qquad (2)$$

where $A$ denotes an ensemble of anomalies. Using this formulation, the ensemble analysis error
covariance, $(I - KH)P^f + \frac{1}{4}KHP^fH^TK^T$, approximately matches the theoretical value.
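
The two covariance expressions can be checked with a short derivation. It is a sketch under assumptions stated here rather than in the report: the ensemble covariance is taken to be $P = AA^T/(N-1)$ for an ensemble of size $N$, and the gain is taken to have the standard form $K = P^fH^T(HP^fH^T + R)^{-1}$, with $R$ the observation error covariance, so that $KHP^f$ is symmetric and $P^fH^TK^T = KHP^f$.

```latex
\begin{aligned}
\text{full gain:}\quad A^a &= (I - KH)A^f
  &&\Rightarrow\quad P^a = (I - KH)P^f(I - KH)^T = P^f - 2KHP^f + KHP^fH^TK^T, \\
\text{half gain:}\quad A^a &= \left(I - \tfrac{1}{2}KH\right)A^f
  &&\Rightarrow\quad P^a = P^f - KHP^f + \tfrac{1}{4}KHP^fH^TK^T
     = (I - KH)P^f + \tfrac{1}{4}KHP^fH^TK^T.
\end{aligned}
```

The half-gain result differs from the theoretical value $(I - KH)P^f$ only by the quadratic term $\frac{1}{4}KHP^fH^TK^T$, which is positive semi-definite and small when $K$ is small.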

A comparison between ESRFs with a mean-preserving transformation (“ETKF, symmetric” and
“ETKF, mean-preserving random rotations”) and the DEnKF for the Lorenz-40 model is
shown in Figure 3. We find that the DEnKF performs almost as well as the mean-preserving ESRFs;
and  that  it  is  clearly  superior  to  the  EnKF  with  perturbed  observations,  particularly  for  small  ensemble 
sizes. 

The  DEnKF  has  many  practical  advantages  over  both  the  ESRF  and  the  traditional  EnKF.  The  DEnKF 
retains  the  simplicity  and  versatility  of  the  traditional  EnKF.  That  is,  it  can  be  developed  and 
implemented  relatively  easily;  and  it  readily  permits  the  use  of  localisation  (Houtekamer  and  Mitchel 
2001;  Hamill  et  al.  2001)  like  the  EnKF.  By  contrast,  the  ESRF  does  not  readily  permit  localisation,  as 
noted  above.  It  can  be  shown  that  the  DEnKF  is  a  linearization  of  the  ESRF;  and  is  deterministic. 
Consequently,  it  does  not  suffer  from  the  same  sampling  error  that  can  degrade  the  performance  of  the 
EnKF when small ensembles are used. Compared to the ESRF, the DEnKF will always over-estimate
the ensemble spread, rather than under-estimating it. This means that while the filter may not converge
as quickly or perform quite as well as an ESRF (Figure 3), it will be more conservative and more
“forgiving” of mis-specified errors, and is not as susceptible to ensemble collapse or filter divergence
as ESRFs. We argue that for any practical data assimilation system, the DEnKF offers a near-optimal
and  computationally  efficient  alternative  to  the  ESRF. 
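
The claim that the DEnKF over-estimates rather than under-estimates the spread can be illustrated with a minimal scalar sketch. It assumes a single, directly observed variable (so the observation operator is 1), a made-up 20-member ensemble and an assumed observation error variance; the function and variable names are hypothetical. It compares the analysis variance obtained with the full gain (the EnKF without perturbed observations), with half the gain (equation (2)), and from Kalman filter theory.

```python
import random
import statistics


def kalman_gain(forecast_var, obs_var):
    """Scalar Kalman gain for a directly observed variable (H = 1)."""
    return forecast_var / (forecast_var + obs_var)


def update_anomalies(anomalies, gain, factor):
    """Scale anomalies by (1 - factor * K).

    factor = 1.0 mimics the EnKF without perturbed observations;
    factor = 0.5 is the half-gain anomaly update of equation (2).
    """
    return [(1.0 - factor * gain) * a for a in anomalies]


random.seed(0)
members = [random.gauss(0.0, 2.0) for _ in range(20)]  # made-up forecast ensemble
ensemble_mean = statistics.mean(members)
anomalies = [m - ensemble_mean for m in members]

forecast_var = statistics.variance(members)  # sample variance (N - 1 denominator)
obs_var = 1.0                                # assumed observation error variance
gain = kalman_gain(forecast_var, obs_var)

theoretical_var = (1.0 - gain) * forecast_var
full_gain_var = statistics.variance(update_anomalies(anomalies, gain, 1.0))
denkf_var = statistics.variance(update_anomalies(anomalies, gain, 0.5))

print("theoretical:", theoretical_var)
print("full gain, no perturbed observations:", full_gain_var)
print("half gain (DEnKF):", denkf_var)
```

In this scalar setting the full-gain update gives $(1 - K)^2 P^f$, which falls below the theoretical $(1 - K)P^f$, while the half-gain update gives $(1 - K/2)^2 P^f$, which exceeds it by $K^2 P^f / 4$.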

IMPACT/APPLICATIONS 

A  new  version  of  the  EnKF  has  been  proposed,  here  referred  to  as  the  DEnKF.  This  filter  boasts  the 
computational  simplicity  and  efficiency  of  the  traditional  EnKF;  but  performs  almost  as  well  as  the 
more complicated ESRFs. The DEnKF has the advantage over the ESRF that it readily permits
the covariance localisation that is typically required for most practical applications. We argue that the
DEnKF  should  be  considered  for  any  application  of  an  ensemble-based  assimilation  system. 


Based on our research, we argue that if an ESRF is to be used, then only mean-preserving
transformations should be used in practice. All other ESRFs are sub-optimal.

This  research  involves  the  development  of  a  data  assimilation  tool  that  will  be  suitable  for  a  variety  of 
ocean models and configurations. We are particularly focused on making this tool useful for
short-range prediction of mesoscale variability. We anticipate that this tool will be valuable for all agencies
involved  in  this  project. 

RELATED  PROJECTS 

The  BLUElink:  Ocean  Forecasting  Australia  project  (http://www.marine.csiro.au/bluelink/)  is  a 
partnership  between  the  CSIRO  Wealth  from  Oceans  Flagship  Program,  the  Australian  Bureau  of 
Meteorology  and  the  Royal  Australian  Navy.  An  EnOI  version  of  BODAS  has  been  developed  and 
applied to OFAM to perform the Bluelink ReANalysis (BRAN) that spans 1992-2004. BRAN
assimilates  altimetric  and  coastal  sea-level,  and  in  situ  temperature  and  salinity  observations  from  all 
available  sources  (e.g.,  Argo,  WOCE,  TAO,  SOOP  XBT).  Comparison  with  independent  observations 
indicates  that  BRAN  successfully  represents  the  true  history  of  the  near-surface  mesoscale  circulation 
around  Australia  for  the  past  decade.  For  example,  Figure  4  demonstrates  excellent  agreement  between 
the  paths  taken  by  un-assimilated  satellite-tracked  surface  drifters  and  the  BRAN  sea-level  anomaly 
fields  off  eastern  Australia  during  2000.  BRAN  uses  an  EnOI  version  of  BODAS  with  covariance 
localisation. This experiment indicates that a simple EnOI system can constrain a realistic,
eddy-resolving ocean general circulation model (Oke et al. 2005; Schiller et al. 2006). We therefore argue
that this approach is suitable for operational ocean prediction.

Figure 4: A sequence of reanalysed sea-level anomalies (monthly means) off eastern Australia
from BRAN and surface drifter paths for the 1-month period centered at the specified date in
2000. Negative (cyclonic) anomalies are blue; positive (anti-cyclonic) anomalies are red; zero is
white; adapted from Oke et al. (2005).

Research on optimal array design and improved ensemble data assimilation algorithms has also
been  supported  by  the  CSIRO  Wealth  from  Oceans  Flagship  Program  (http://www.csiro.au/csiro/ 
channel/pchba„.html).  A  key  element  of  this  program  is  the  enhancement  of  our  ocean  forecasting 
capabilities.  Under  this  project  we  have  developed  a  method  for  designing  an  optimal  observation 
array  using  ensemble-based  data  assimilation  theory  (Oke  and  Schiller  2006;  Oke  and  Sakov  2006a); 
and  developed  a  method  for  estimating  representation  error  for  oceanic  observations  (Oke  and  Sakov 
2006b).  Much  of  this  research  has  strong  synergies  with  the  project  that  is  the  subject  of  this  annual 
report. 